-
-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore compatibility with docopt 0.6.2 docstrings #36
Conversation
0a2429a
to
b008bf1
Compare
Just pushed a change to import |
Related issue in the docopt repo: docopt/docopt#259 |
assert parse_section("usage:", "foo bar fizz buzz") == [] | ||
assert parse_section("usage:", "usage: prog") == ["usage: prog"] | ||
assert parse_section("usage:", "usage: -x\n -y") == ["usage: -x\n -y"] | ||
assert parse_section("usage:", usage) == [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@h4l It looks like the constant usage
isn't used anymore, should be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I'll get rid of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- remove unused usage constant
docopt/__init__.py
Outdated
""" | ||
usage_pattern = r""" | ||
# Any number of lines precede the usage section | ||
\A(?P<before_usage>(?:.*\n)*?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the ?
redundant in the *?
? can't we just have *
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so this is a bit tricky. I thought the same looking at the original regexes docopt used, I assumed +?
could just be *
. But the ?
acts as a modifier to make the *
non-greedy . So the effect here is that we'll match as few lines as possible before a usage:
line occurs.
Without the non-greedy operator it'd match as many lines as possible. This is actually still OK for correctness, as the match will initially consume all the lines, then backtrack until it finds a usage:
line to match the usage section with, and succeed from there. However the effect of that is that the final usage: line in the doc will be used. With the non-greedy match, the first usage: section gets used.
So it only matters if a doc incorrectly has multiple usage sections. There is some code that tries to check for multiple usage sections, and raises an error if there are multiple (that's lint_docstring
in this version). I just tried with the standard greedy match, and lint_docstring
currently fails to notice the second usage section in this case, as it currently looks only in the usage body and the section after usage, not in the section before.
The same effect as the non-greedy match can be achieved in this case using a negative lookahead assertion in this before_usage
group to not allow it to match lines containing usage:
. That seems like a more clear approach, as non-greedy matching is pretty obscure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, that makes sense. I think you're right, the negative lookahead is more clear, otherwise I'd want a comment here, and it would be great to not have a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- make usage_pattern regex more intuitive by using lookahead instead of non-greedy match
docopt/__init__.py
Outdated
# Any number of lines precede the usage section | ||
\A(?P<before_usage>(?:.*\n)*?) | ||
# The `usage:` section header. | ||
^(?P<usage_header>.*usage:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably overkill, but perhaps .*\busage:
so that it won't match "ingredients in pork sausage:"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I'll do it.
""", | ||
], | ||
) | ||
def test_parse_docstring_sections__reports_invalid_docstrings(invalid_docstring: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about a test case for usage:
being present, but nothing after that?
"""
program summary
usage:
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, that test is now in test_lint_docstring
.
docopt/__init__.py
Outdated
# They can be indented with whitespace | ||
[ \t]* | ||
# The description itself starts with the short or long flag (-x or --xxx) | ||
(-\S+?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, don't we want -\S+
? We need one or more non-whitespace after the -
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same story as the other with the non-greedy modifier, but looking at this again I don't think it's necessary. I think the repetition itself isn't needed. I'll get rid of it.
@h4l sorry I just left a few thoughts but I don't think I'm done with a review yet. Let me look at this a bit more before you go and rework things, in case I find anything else |
Oh ok, no probs. No rush. |
tests/testcases.docopt
Outdated
-i, --interactive interactive picking | ||
-p, --patch select hunks interactively | ||
""" | ||
@pytest.mark.xfail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be $ prog --interactive
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will always pass because no test case is actually generated.
Really, we need to fix conftest.py
Lines 14 to 27 in 88496b8
def parse_test(raw: str): | |
raw = re.compile("#.*$", re.M).sub("", raw).strip() | |
if raw.startswith('"""'): | |
raw = raw[3:] | |
for fixture in raw.split('r"""'): | |
doc, _, body = fixture.partition('"""') | |
cases = [] | |
for case in body.split("$")[1:]: | |
argv, _, expect = case.strip().partition("\n") | |
expect = json.loads(expect) | |
prog, _, argv = argv.strip().partition(" ") | |
cases.append((prog, argv, expect)) | |
yield doc, cases |
Something like
if len(cases) == 0:
raise DocoptTestException(f"Each example must have at least one test case starting with '$'. Example: {doc}")
Do you want to add this? I can add in followup commit as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, oops. That @pytest.mark.xfail
line is a mistake too — I was playing with allowing marking cases with pytest marks (to allow xfailing cases, or tagging cases to run a subset) but then I pulled that out, but forgot to remove that line. And seems like I butchered the lines around there somehow.
I can put in a check for that, yep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some error handling to conftest based on your suggestion, should be more robust against these kind of mistakes now! It also reports details of which test fails if there's a JSON syntax error in an test too.
@@ -950,8 +950,35 @@ local options: --baz | |||
other options: | |||
--egg | |||
--spam | |||
-not-an-option- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't delete this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so the reason I got rid of this is that with docopt 0.6.2 this is actually a valid way to specify an option. In docopt-ng currently, option descriptions have to have leading whitespace, but not in 0.6.2.
I could put it back with a different name, like
-option-without-whitespace
although the "docopt 0.6.2 compatibility" test below also has options without leading whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha. I guess it follows the line starting with - or --
rule, so you're right the other tests cover this case.
But, separate from the leading whitespace thing, I'm still hung up on the
-starts-with-single-dash-but-is-long, that isn't intuitive to me if that should be a long or short option.
Testing:
-long-solo
-s-d --single--double
--double-single -d-s
results in
{"-long-solo": false, "--single--double": false, "--double-single": false}
which I think makes sense, but probably should add a test case to make it official. Can add in follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed that there was only one dash on that option. The option description parsing code is very lenient, it just assumes the doc is formatted according to the spec, and does almost no validation. The spec does have quite a clear definition of short vs long option in general (not specifically for option descriptions):
Words starting with one or two dashes (with exception of "-", "--" by themselves) are interpreted as short (one-letter) or long options, respectively.
- Short options can be "stacked" meaning that -abc is equivalent to -a -b -c.
And the usage pattern / argv parsers (the Pattern
classes) do implement this behaviour (slight quirk with the hyphen):
>>> docopt.docopt('usage: prog [-single-dash]', [])
{'--': False, '-a': False, '-d': False, '-e': False, '-g': False,
'-h': False, '-i': False, '-l': False, '-n': False, '-s': 0}
And the argv parser/matcher doesn't recognise the the single dash option:
>>> docopt.docopt('usage: prog [options]\noptions: -single-dash\n', '-single-dash')
usage: prog [options]
options: -single-dash
An exception has occurred, use %tb to see the full traceback.
I'd suggest following the behaviour implemented in the Pattern code, as it is significantly more detailed/careful than than the option default parsing code.
tests/test_docopt.py
Outdated
def test_issue_126_defaults_not_parsed_correctly_when_tabs(): | ||
section = "Options:\n\t--foo=<arg> [default: bar]" | ||
assert parse_defaults(section) == [Option(None, "--foo", 1, "bar")] | ||
assert parse_options(section) == [Option(None, "--foo", 1, "bar")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we just add this as a test case to the cases below? Before, we didn't have this "unit" testing for parsing options, so it made sense to have this as a separate test. But now we do have very similar tests in test_parse_options
, so just put this case in there too. Having it separate implies it is somehow special, but I think this is just another instance of those other cases we test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- remove redundant test
test_issue_126_defaults_not_parsed_correctly_when_tabs
(include its example in main options test)
@@ -950,8 +950,35 @@ local options: --baz | |||
other options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you say in one of the commit messages:
Note that docopt 0.6.2 recognises option descriptions in the text
prior to the usage section, but this change does not, as it seems like
an unintended side-effect of the previous parser's implementation, and
seems unlikely to be used in practice.
We need to change the documentation in the README that says
Every line in doc that starts with - or -- (not counting spaces) is treated as an option description, e.g.
Then we should add a test for this:
# lines before the usage section that start with --
# are not treated as options.
r"""My CLI program
--speed is ignored as an option
usage: prog --direction
"""
# Note that "--speed" is not present
$ prog --direction
{"--direction": true}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, docopt.org says "Every line that starts with "-" or "--" (not counting spaces) is treated as an option description, e.g.:" so I think we want to stick with that. People are going to be reading that spec, and it's so stable I don't want to mess with it.
That does really throw a wrench in this implementation. What do you think, re-work this implementation to allow for --options to be scattered anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a problem to allow that. We just need to include sections.before_usage
in the string that gets passed to parse_options()
. I think you're right, better to stick to the 0.6.2 behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Include content in
before_usage
when parsing option descriptions
docopt/__init__.py
Outdated
# The first line of the body may follow the header without a line break: | ||
(?: | ||
# Some non-whitespace content | ||
[ \t]*\S.*(?:\n|\Z) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we simplify (?:\n|\Z)
to [\n\Z]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we can't, the \Z magic doesn't work in character classes
Codecov Report
@@ Coverage Diff @@
## master #36 +/- ##
==========================================
- Coverage 99.79% 99.79% -0.01%
==========================================
Files 1 1
Lines 492 487 -5
==========================================
- Hits 491 486 -5
Misses 1 1
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
This implements Nick's suggestion from jazzband#36. When generating tests from testcases.docopt, we now raise an error if an example doc has no testcases following it to actually exercise the example. Also, tests with invalid JSON testcases are reported using the same error reporting mechanism.
CI should pass now, I fixed the 3.7/3.8 compatibility issue. I should have covered everything, but need to discuss the |
Oh, also, fixing the |
The simple fixes are not-yet-squashed fixup commits, I can rebase/squash to cleanup the history before merging as and when we're ready. |
"""\ | ||
\t --foo ARG, -f ARG With a long | ||
|
||
wrapped description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@h4l what actually terminates an --option item? eg
--foo Intended description.
Now I'm back to describing the usage of my program
in general. I include [default: 42] on accident
(OK this is very contrived)
should --foo have a default of 42? To me it looks like that option description ends on the single line and so it should not have that default.
Should we have the same behavior as when the usage: section terminates? eg the option description starts when we see r"(?: )|$"
(as we currently have), and then continues for any number of indented lines?
Oh but then this needs to be modified to "continues for any number of indented lines as long as they don't start with - or --"...yuck
This isn't really solving a likely problem, so we could just leave it unspecified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think you answered it yourself, but the description continues until a line starts with [ \t]-\S
. The previous and current code just splits the entire option description text block on lines starting in that way, and everything in between becomes part of the description for the option. So it's not at all nuanced in its handling of unusual stuff.
As for if the default should apply, I've certainly written option descriptions which extend over several lines. But I would keep them indented. Seems reasonable to end a description on a non-indented line, but it'd certainly require some thought.
Like I was mentioning in the other thread, the option description parsing code is really lenient. There was talk of defining a grammar for the docopt language in another issue — even if the implementation continued to use the hand-written parser, it'd be useful to have a grammar for the specification value of what is/isn't valid.
tests/test_docopt.py
Outdated
pytest.param("", id="empty"), | ||
pytest.param("This is a prog\n", id="1line"), | ||
pytest.param( | ||
"This is a prog\n\n" "Info:\n Blah blah\n\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird little " "
hole in the middle of this constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah strange, probably due to Black formatting a wrapped string.
- Fix split string literal
Option("-e", None, 0, False), | ||
], | ||
), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- All of these are the "happy cases" where there truly are options. We should add cases for non-options, such as
some text --foo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I've added another case to test this.
@h4l those fixups look great! Just a few more tiny requests. The large one ("what terminates an --option?") can be a separate PR if you want to just get this in (My head is starting to hurt trying to keep all of these changes in my head) but the small ones should be included in this PR. Squashing the fixups would be nice, but if it's a pain then I'm fine with them staying in there. |
@NickCrews Great, thanks for all your help reviewing this, it's really helpful. And sorry to make things trickier with that fix. That's the "fix: handle options with descriptions on new line" commit, right? I'd be happy to take it out for another PR. The only trouble is that one of the testcases.docopt tests fails because of it — the
We could take out the fix and mark this test as xfail to correct later. FWIW the change shouldn't actually change the definition of how far an option description (the part with the What's happening is that But the same general issue can occur on master: In [1]: from docopt import parse_defaults
In [2]: parse_defaults("""
...: options:
...: -f
...: BLAH BLAH --other-option
...: """)
Out[2]: [Option('-f', '--other-option', 1, None)] I don't think this fix should affect backwards compatibility, as the initial part this affects was always intended be on a single line, and putting them on multiple lines only "works" using this contrived syntax (the |
This implements Nick's suggestion from jazzband#36. When generating tests from testcases.docopt, we now raise an error if an example doc has no testcases following it to actually exercise the example. Also, tests with invalid JSON testcases are reported using the same error reporting mechanism.
@NickCrews I fixed those two things. Let me know what you'd like to do with the option defaults newline fix. I pushed a separate branch to my repo with the fixups squashed out as an example: |
It was throwing an error due to the options: section occurring in usage:, but the test was passing because the message of the exception was not checked.
parse_docstring_sections() divides the docstring into the main non-overlapping sections that are to be further analysed. These are text before the usage, the usage header ("usage:" or similar), the usage body (" proc [options]") and text following the usage. The intention of this partitioning is to facilitate restoring the option description parsing behaviour of the last stable docopt release (0.6.2), while retaining the current improved behaviour of not requiring a blank line after the usage: section; and also removing a small quirk in the 0.6.2 parser, which is that option-defaults can occur before the usage: section. (But this partitioning provides access to the text before the usage section, so this behaviour could be retained if desired.)
parse_options() parses option descriptions in a way that is compatible with docopt 0.6.2, while also compatible with docopt-ng's current behaviour. The differences are: - docopt-ng requires option descriptions to be in an "options:" section, docopt allows them to be anywhere outside the "usage:" section. - docopt-ng requires options descriptions have leading whitespace, docopt allows them to start at column 0. - docopt-ng allows options descriptions to begin on the same line as a section heading, docopt does not. e.g. `options: --foo` is OK with docopt-ng. parse_options() parses options following either docopt or docopt-ng's behaviour. Although it expects the `docstring` argument to be a sub-section of the overall docstring, so the caller is in control of where in the docstring options are parsed from.
This commit uses parse_docstring_sections() and parse_options() to parse docstrings accepted by docopt 0.6.2, while retaining docopt-ng's improvements to supported syntax. Currently, docopt-ng parses option-defaults using a strategy that was in docopt's master branch, but considered unstable by the author, and was not released in docopt. It looks for option descriptions in an "options:" section, which is ended on the first blank line. This has the side-effect that options defined in a man-page style — with blank lines in-between — are not found. Neither are options outside an options: section (docopt allows options to follow the usage with no section heading). parse_docstring_sections() is used to separate the usage section from the rest of the docstring. The text before the usage is ignored. The usage body (without its header) is parsed for the argument pattern and the usage header with its body is used to print the usage summary help. The text following the usage is parsed for options descriptions, using parse_options(), which supports option the description syntax of both docopt and the current docopt-ng. Note that docopt 0.6.2 recognises option descriptions in the text prior to the usage section, but this change does not, as it seems like an unintended side-effect of the previous parser's implementation, and seems unlikely to be used in practice. The testcases have two cases added for docopt 0.6.2 compatibility. This fixes jazzband#33
This implements Nick's suggestion from jazzband#36. When generating tests from testcases.docopt, we now raise an error if an example doc has no testcases following it to actually exercise the example. Also, tests with invalid JSON testcases are reported using the same error reporting mechanism.
This fixes a pre-existing bug in docopt-ng that wasn't previously triggered by a test. Options in the options: section without a description on the same line and with content on the following line (e.g. a line-wrapped description) would be parsed as if the content on the following line was part of the option name & arguments. Option.parse() now ends the option name & arguments section on the end of the line, not just on two consecutive spaces.
It's a bit of an obscure and non-obvious regex feature. The regex uses a negative lookahead assertion now, which is still a little odd, but it's clearer I think.
The usage section heading now needs to have a word break before usage:, so e.g sausage: doesn't match as a usage: section.
lint_docstring() now checks for the usage_body being empty, and fails with a message indicating this.
parse_docstring_sections() is no longer responsible for rejecting docs with empty usage sections. Previously it did so to some extent, but failed to reject whitespace-only usage bodies. lint_docstring() now rejects empty usage bodies, which allows the regex's usage_body group to be simplified. The removed test for empty usage bodies is tested via test_lint_docstring.
Docopt 0.6.2 allows option descriptions anywhere in the doc, not just after the usage: section. With this change, we parse both the section prior to the usage: and the section after for option descriptions.
This adds another case for test_parse_options demonstrating how parse_options() parses docs which contain both options and text that mentions options that should not be interpreted as options. This includes an example (the '-b' option) where we do currently parse an option mentioned in prose as a real option. In principle we could fix this, but it's here for now to document current behaviour. And it's not hard to work around when writing a usage doc - just need to adjust where a line is wrapped.
1cdb833
to
abf13dd
Compare
@NickCrews I've updated this PR's branch to the squashed & rebased version (same as #38) so you can merge here if you're happy. |
This implements Nick's suggestion from #36. When generating tests from testcases.docopt, we now raise an error if an example doc has no testcases following it to actually exercise the example. Also, tests with invalid JSON testcases are reported using the same error reporting mechanism.
Brilliant, thank you so much @h4l! I'll close #38 as it was superseded by this. OK, so now I'm trying to summarize what were the changes, to determine if version number should be bumped as breaking, and what to put in changelog:
Am I missing anything? Woudl you describe this any differently? |
Fantastic, thanks for your help with this @NickCrews, I really appreciate it, especially as this turned out to be quite a significant change! I'd want to frame the changes in the context of restoring compatibility with 0.6.2, to make sure people understand why the behaviour has changed. This also implies the changes shouldn't be backwards incompatible/breaking for people who were using original docopt. Can also link to #33 which was fixed by this PR.
If you want to be very strict, in the PR we also stopped allowing usage section headings without a regex word break before While rebasing the other PR I notice we've also fixed #9 here. That could be phrased as:
|
Because it's still v0, under semver, the version bump could be minor if you'd rather not do v1 yet. |
Awesome, thanks for helping me wrap my head around it. I'll bump from 0.8.1 to 0.9.0 |
Great, thanks for handling that! |
This is my proposal to fix #33. (With the assumption that compatibility with docopt is a goal of docopt-ng.)
This PR adds two new functions,
parse_docstring_sections()
andparse_options()
; and uses them to parse docstrings accepted by docopt 0.6.2, while retaining docopt-ng's improvements to supported syntax.Currently, docopt-ng parses option-defaults using a strategy that was in docopt's master branch, but considered unstable by the author, and was not released in docopt. It looks for option descriptions in an "options:" section, which is ended on the first blank line. This has the side-effect that options defined in a man-page style — with blank lines in-between — are not found. Neither are options outside an options: section (docopt allows options to follow the usage with no section heading).
parse_docstring_sections() is used to separate the usage section from the rest of the docstring. The text before the usage is ignored. The usage body (without its header) is parsed for the argument pattern and the usage header with its body is used to print the usage summary help. The text following the usage is parsed for options descriptions, using parse_options(), which supports option the description syntax of both docopt and the current docopt-ng.
Note that docopt 0.6.2 recognises option descriptions in the text prior to the usage section, but this change does not, as it seems like an unintended side-effect of the previous parser's implementation, and seems unlikely to be used in practice.
The testcases have two cases added for docopt 0.6.2 compatibility.
The first commit ("Fix test for missing arg before --") could be merged separately, but my final commit's tests depends on it.